Audio signal analysing method and apparatus

ABSTRACT

A method for determining the key of an audio signal such as a music track. Portions (106) of the audio signal are analysed (104) to identify (108) a musical note and its associated strength (110) within each portion. Some notes identified in a portion may be ignored (118) to enable notes related to the key to be more readily distinguished. A first note is then determined (124) from the identified musical notes as a function of their respective strengths. From the identified musical notes, at least two further notes are selected (128) as a function of the first note. The key of the audio signal is then determined (130) based on a comparison of the respective strengths of the selected notes.

The present invention relates to a method and apparatus for determining a feature of an audio signal, in particular the musical key.

With the advent of cheaper storage and access to the Internet, consumers can access and accumulate vast amounts of information and content including video, audio, text and graphics. There is a recognised need for classification in order to facilitate search and access of such content by consumers. In an audio context, classification may be performed on the basis of music genre, artist, composer and the like. These classifications however may be limiting where selection is on the basis of mood or other emotionally-specific criteria. For example, romantic music can be considered to span a range of composers and musical styles within classical, popular and other musical traditions. Emotional music may be characterised in terms of its inherent musical features including level, tempo and key, each of which is independent of a specific genre, composer or similar classification.

In U.S. Pat. No. 5,038,658 to Tsuruta et al, an automatic music transcription method and apparatus capable of determining the key of acoustic signals is disclosed. A disadvantage of the method employed is the need to perform multiple segmentation of the acoustic signal in order to determine the musical intervals necessary to determine the key, including segmentation on the basis of changes in the obtained power information, on the basis of standard note lengths and on the basis of whether or not the musical intervals of the identified segments in continuum are identical. A further disadvantage of the method is the need to extract the pitch information in the time domain by means of autocorrelation.

In the paper “Querying Large Collections of Music for Similarity” (Welsh et al, UC Berkeley Technical Report UCB/CSD-00-1096, November 1999), a system capable of performing queries against a large archive of digital music is presented using a technique based on a set of feature extractors which pre-process a music archive. One feature extractor produces a histogram of frequency amplitudes across notes of a music scale, each bucket of the histogram corresponding to the average amplitude of a particular note (e.g. C sharp) across 5 octaves for the sample of music analysed. It is stated that this information can be used to help determine the key that the music was played in; however, a method is not disclosed. A further disadvantage of the approach is a potential difficulty in discriminating from the averaged note data those notes that are related to the key of the music.

It is an object of the present invention to improve on the known art.

In accordance with a first aspect of the invention there is provided a method for determining the key of an audio signal, the method comprising the steps of:

-   for each of a plurality of signal portions, analysing the portion to identify a musical note, and where at least one musical note is identified:
    -   determining a strength associated with the or each musical note; and
    -   generating a data record containing the identity of the or each musical note, the strength associated with the or each musical note and the identity of the portion;
-   for each of the data records, ignoring the strength associated with an identified musical note where said strength is less than a predetermined fraction of the maximum strength associated with any identified musical note contained within the data records;
-   determining a first note from the identified musical notes as a function of their respective strengths;
-   selecting at least a second and a third note from the identified musical notes as a function of the first note; and
-   determining the key based on a comparison of the respective strengths of the at least second and third notes.

In accordance with a second aspect of the invention there is provided an apparatus for determining the key of an audio signal, the apparatus comprising:

-   an input device operable to receive a signal;
-   a data processing apparatus operable to:
    -   for each of a plurality of signal portions, analyse the portion to identify a musical note, and where at least one musical note is identified:
        -   determine a strength associated with the or each musical note; and
        -   generate a data record containing the identity of the or each musical note, the strength associated with the or each musical note and the identity of the portion;
    -   for each of the data records, ignore the strength associated with an identified musical note where said strength is less than a predetermined fraction of the maximum strength associated with any identified musical note contained within the data records;
    -   determine a first note from the identified musical notes as a function of their respective strengths;
    -   select at least a second and a third note from the identified musical notes as a function of the first note; and
    -   determine the key based on a comparison of the respective strengths of the at least second and third notes.

Owing to the invention it is possible to determine the key of an audio signal in an efficient and accurate manner. The audio signal may be a digital or analogue recording of a piece of music.

Preferably each portion is the same size, and each portion encompasses the same length of time. Advantageously, the size of the portion is a function of the tempo of the audio signal. The portions may be contiguous. Preferably, the predetermined fraction is determined in dependence on the content of the audio signal. Ideally, the predetermined fraction lies in the range of one tenth to one half, with a preferred embodiment of the predetermined fraction being one seventh.

Advantageously, the step of analysing the portion to identify a musical note comprises the steps of:

-   converting the portion to a frequency domain representation;
-   subdividing the frequency domain representation into a plurality of octaves;
-   for each octave containing a maximum amplitude:
    -   determining a frequency value at the maximum amplitude; and
    -   selecting a note name of a musical scale in dependence on the frequency value; and
-   identifying a musical note in dependence on the same note name being selected in more than one octave.

In this embodiment, the conversion of the portion to a frequency domain representation is preferably performed by means of a Fourier Transform. The musical scale is ideally the Equal Tempered Scale.

In a preferred embodiment, the step of determining a strength associated with the musical note comprises the steps of:

-   determining the amplitude of each frequency component of the musical note; and
-   summing the amplitudes.

Advantageously, the step of determining the first note comprises the steps of:

-   for each identified musical note, summing the strengths associated with the musical note in the data records; and
-   determining the first note to be the identified musical note with the maximum summed strength.

In a preferred embodiment, the first note is the tonic of the key.

An advantage of the present invention is that portions of the audio signal used for analysis may be selected arbitrarily and such selection is thus independent of the content of the audio signal. Furthermore, the method of the invention relies on detecting the presence of musical notes which are related to the key of the audio signal, preferably detecting notes originating from a particular type of musical source (e.g. instrument). Advantageously, determining the timing and duration of musical notes is not relevant to the method. A further advantage is that filtering is applied to eliminate contributions from irrelevant notes (and noise) which would otherwise confuse the process of determining the identities of the notes of interest. Moreover, the method of the invention is suitable for implementation in low cost hardware and/or software, thereby enabling deployment in high volume consumer products.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a flow diagram of a method for determining the key of an audio signal;

FIG. 2 is a flow diagram of a step in the method of FIG. 1 for analysing a portion of the audio signal;

FIG. 3a is a series of graphs showing an example of a frequency domain representation of a portion of the audio signal;

FIG. 3b is a table showing a set of data records corresponding to portions of the audio signal, including the portion represented in FIG. 3a;

FIG. 4a is a table showing a set of data records corresponding to portions of the audio signal;

FIG. 4b is a table showing total strengths associated with identified notes as derived from the data within the table of FIG. 4a; and

FIG. 5 is a schematic representation of an apparatus for determining the key of an audio signal.

FIG. 1 shows a flow diagram of a method for determining the key of an audio signal. Typically, the audio signal is received by an input device (510, FIG. 5) of an apparatus (500, FIG. 5) which carries out this method. The method, shown generally at 100, starts at 102 and analyses 104 a portion of the audio signal to identify a musical note (as described in more detail below). Preferably, the key is determined using identified bass musical notes. These notes can be characterised by their fundamental components residing within the bass register and having one or more harmonically related frequency components, the components correlating with a recognised musical scale. Such notes may be sounded by a pitched instrument (that is, an instrument which can sound one or more notes according to a musical scale), for example a bass guitar or double bass. Where at least one musical note has been identified 108 for the portion, the method then determines 110 a strength associated with the musical note or notes. The strength is determined as a function of the amplitude of one or more frequency components of the identified musical note. Once the strength associated with each musical note within a portion has been determined, a data record 120 is generated 112 comprising the identity of the musical note or notes, the strength associated with each musical note and the identity of the portion. The method then checks 116 to ensure that steps 104, 108, 110 and 112 are performed for all portions 106 of the audio signal that are to be processed. It is to be noted that the portions may encompass only part of the total received audio signal and that the portions may or may not be contiguous.

Each data record 120 of the resulting set 114 of data records is reviewed in order to ignore 118 any strength within the record which is less than a predetermined fraction (e.g. one seventh) of the maximum strength associated with any identified musical note contained in any record within the set of data records. Such strengths can be deleted 122 from the data records. The purpose is to filter out those note strengths which may affect the discrimination of notes within the audio signal which are related to the key.

Next, the method determines 124, using filtered data 126, a first note from the identified musical notes as a function of their respective strengths. Then, at least a second and a third note are selected 128 from the identified musical notes as a function of the first note, again using filtered data 126. The notes selected depend on the musical scale employed in the analysis. Preferably, the Equal Tempered Scale is used. For this scale system, the first note would represent the tonic of the scale and the second and third notes could respectively represent alternative interval notes, each corresponding to the major and minor modes of the key. Additional notes may be selected depending on the modality of the key to be determined. The key is then determined 130 based on a comparison of the respective strengths of at least the second and third notes. The method ends at 132.
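By way of illustration only, the filtering and first-note steps of FIG. 1 might be sketched in Python as follows, assuming the per-portion analysis of step 104 has already produced one note-to-strength mapping per portion. The function and variable names are illustrative, not taken from the patent; selection of the second and third notes (steps 128 and 130) is sketched later with FIG. 4b.

```python
from collections import defaultdict

def first_note(records, fraction=1/7):
    """Sketch of steps 118 and 124 of FIG. 1. `records` is one dict per
    analysed portion mapping note name -> strength (output of step 112).
    Returns the first note (tonic) and the filtered per-note totals."""
    # Step 118: ignore strengths below a fraction of the overall maximum.
    peak = max((s for r in records for s in r.values()), default=0.0)

    # Step 124: total the surviving strengths per note name.
    totals = defaultdict(float)
    for record in records:
        for name, strength in record.items():
            if strength >= fraction * peak:
                totals[name] += strength

    tonic = max(totals, key=totals.get)  # note with the maximum total
    return tonic, dict(totals)
```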

FIG. 2 shows a flow diagram describing in greater detail the step 104 in the method of FIG. 1 for analysing a portion of the audio signal. The method starts at 202 and proceeds to convert 204 the portion to a frequency domain representation. Any suitable means of conversion may be used; preferably, the conversion is performed by means of a Fourier Transform. Next, the frequency representation is subdivided 206 into a number of octaves, since musical scales can be constructed using octaves. Any suitable musical scale may be employed; preferably the Equal Tempered Scale is used, since this musical scale is commonly the basis of many music genres and styles. A maximum amplitude frequency component is searched for within each octave. Where such a maximum exists, the frequency value at the maximum amplitude is determined 208. A note name of a musical scale (for example, the Equal Tempered Scale) is then selected 210 according to the determined frequency value. The determined frequency value should correspond exactly to, or at least lie within a predetermined range (e.g. +/-10%) of, the reference frequency value of a musical scale note with a specified note name.
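As one possible realisation of step 210 (an implementation assumption, not taken from the text), a frequency value can be snapped to the nearest Equal Tempered pitch and rejected if it lies too far from that reference. Note that snapping to the nearest semitone already bounds the residual deviation at roughly 3%, so it is a tolerance tighter than the nominal +/-10% that actually rejects out-of-tune peaks.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_name_for_frequency(freq_hz, tolerance=0.03):
    """Return the Equal Tempered note name nearest to freq_hz, or None if
    the frequency deviates from the nearest reference pitch by more than
    `tolerance` (relative). Reference tuning: A4 = 440 Hz."""
    if freq_hz <= 0:
        return None
    semitones = 12 * math.log2(freq_hz / 440.0)  # signed distance from A4
    nearest = round(semitones)
    reference = 440.0 * 2 ** (nearest / 12)      # nearest reference pitch
    if abs(freq_hz - reference) / reference > tolerance:
        return None
    return NOTE_NAMES[(nearest + 9) % 12]        # A sits at index 9 of a C-based scale

# note_name_for_frequency(330.0) -> "E" (E4 is ~329.63 Hz)
```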

The particular predetermined range chosen may be dependent on the frequency tolerance of the musical notes within the audio signal; the frequency tolerance in turn may be influenced by, for example, the musical source or sources not being in tune with the reference tuning of the musical scale. The difference in tuning can be measured and the predetermined range chosen accordingly to compensate. Distortions can occur in the path from the musical sources to the key determining method or apparatus. Types of distortion in the path include wow and flutter, data corruption and noise. As such distortions may vary with time, a nominal predetermined range such as +/-10% could be chosen, or a more complex scheme might be employed to continuously measure the distortion and dynamically adapt the predetermined range.

A note name of a musical scale describes all notes related in terms of octave multiples (that is, notes with the same name are harmonically related); a specific note within a scale may be characterised by a note name and a particular octave. The method checks 212 to ensure all the octaves of the frequency domain representation of the portion are processed by steps 208 and 210. Note names selected in the octaves are then compared 214; where two or more of the same note names occur, they are deemed to identify 216 a musical note. This is because musical sources such as vocalists and instruments can produce sounds characterised by a set of frequency components which are harmonically related; that is, the frequency components of a note sounded by such a musical source are positioned at multiples of one another. The method ends at 218.
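The comparison 214 and identification 216 can be expressed compactly. The following sketch (illustrative names, with the strength calculation of FIG. 3a anticipated) takes the note name and amplitude of each octave's maximum and deems a note identified when its name recurs in two or more octaves.

```python
from collections import Counter

def identify_notes(octave_maxima):
    """octave_maxima: list of (note_name, amplitude) pairs, one per octave
    maximum (note_name may be None where an octave has no usable maximum).
    Returns {note_name: strength} for names occurring in >= 2 octaves,
    with strength as the sum of the matching amplitudes."""
    counts = Counter(name for name, _ in octave_maxima if name)
    strengths = {}
    for name, amplitude in octave_maxima:
        if name and counts[name] >= 2:
            strengths[name] = strengths.get(name, 0.0) + amplitude
    return strengths

# The FIG. 3a example, octaves 1-5 with maxima E, D, A, E, E (amplitudes
# here are placeholders):
# identify_notes([("E", 5.0), ("D", 4.0), ("A", 3.0), ("E", 2.0), ("E", 1.0)])
# -> {"E": 8.0}; D and A occur only once each and are not identified.
```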

It will be evident to the skilled person that the method may potentially identify none, one or more musical notes for a portion. Where the frequency domain representation of a portion is subdivided into a number of octaves, the ability to identify more than one musical note is dependent on the number of octaves into which the representation is subdivided; two or three octaves can identify up to one musical note, four or five octaves can identify up to two musical notes, and so on. The range of notes produced by a musical source may influence the number of octaves into which the frequency domain representation of a portion should be subdivided. As an example, an audio signal may comprise musical notes residing within the frequency range 27 Hz to 4.1 kHz (e.g. a pianoforte capable of sounding notes from A0 to C8 of the Equal Tempered Scale). In this example, the method would subdivide the frequency domain representation of a portion of the audio signal into this range plus, say, at least one or two further octaves (e.g. 11 octaves in total, octaves 0 to 10 of the Equal Tempered Scale) in order to identify the high pitch notes of the piano. However, such a holistic approach is unnecessary for the purpose of key determination and a subset of octaves is preferably used; for example, a musical source with a particular register may be used to determine the key. Preferably, the audio signal comprises bass notes and the method can subdivide the frequency domain representation of a portion of the audio signal into five octaves (for example, octaves 1 to 5 of the Equal Tempered Scale) in order to identify the bass notes.
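The counting rule above follows directly from each identification requiring the same note name in at least two octaves; a one-line sketch makes this explicit.

```python
def max_identifiable_notes(num_octaves):
    """Each identified note needs the same name in at least two octaves,
    so a portion can yield at most num_octaves // 2 distinct notes."""
    return num_octaves // 2

# max_identifiable_notes(3) -> 1; max_identifiable_notes(5) -> 2
```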

FIG. 3a is a series of graphs showing an example of a frequency domain representation 300 of a portion of the audio signal. The frequency domain representation is subdivided into a number of octaves. In FIG. 3a, five amplitude-frequency graphical representations 301, 302, 303, 304, 305 are shown, each representing one octave in scale (logarithmically in the horizontal frequency axis). The octaves are chosen such that they encompass a range of frequencies in which suitable components of the sounded musical notes, if present in the portion, will reside. Preferably, bass musical notes are to be identified; therefore, suitable octaves include those which encompass the fundamental and harmonic components of notes produced by bass instruments, for example, in the case of the Equal Tempered Scale, octave numbers 1 to 5. The amplitude outlines of the frequency components of the portion within each octave are shown as 306, 308, 310, 312, 314. Each of these outlines is reviewed to detect a maximum (if present). In the example shown, each octave has a maximum, shown at 316, 318, 320, 322, 324 respectively. In FIG. 3a, each amplitude-frequency graphical representation 301 to 305 is arranged to cover the same note sequence for one octave of the Equal Tempered Scale; for example, the frequency value (in an octave) for note C lies at the origin, with the frequency axis scale covering one octave. Maxima 316, 320 and 324 all relate to the same note name, E, as depicted by line 326 which represents the same note name (E) common to all the octaves (since each octave is depicted using a logarithmic frequency axis and the representations 301 to 305 are arranged vertically as shown). Therefore, note E occurs (i.e. is a maximum frequency component) in more than one octave (here, three octaves). Note E is therefore deemed to be identified. A strength associated with the identified note E is then determined by summing the amplitudes of the frequency components in the octaves in which the note name corresponds to the maximum amplitude. In the present example, the strength comprises the sum of the amplitude values e1, e3, e5 of the relevant (maximum) frequency components of the note in the respective octaves. Reviewing the other octaves, it can be seen that there is no same-note correspondence for maxima 318 and 322, these being respectively a frequency component of note D (with amplitude d2) and a frequency component of note A (with amplitude a4).

FIG. 3b shows a table containing a set of data records corresponding to portions of the audio signal, including the portion represented in FIG. 3a. A set of data records 327 is created during the analysis of portions of the audio signal. Each record includes fields to identify the note 328, a strength 330 associated with the note and the portion 332 in which the note was identified. As previously discussed, more than one note may be identified within a portion; FIG. 3b provides such an illustration in the case of the data records for the portion numbered 2. A data record for the portion represented in FIG. 3a is shown and includes the identity 334 of the identified note, the calculated strength 336 associated with the note and the identity 338 of the portion.
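A record of the FIG. 3b kind might be represented as a simple structure; the field names below are illustrative assumptions matching fields 328, 330 and 332.

```python
from dataclasses import dataclass

@dataclass
class NoteRecord:
    note: str        # identity of the note (field 328), e.g. "E"
    strength: float  # strength associated with the note (field 330)
    portion: int     # identity of the portion (field 332)

# e.g. the FIG. 3a portion might yield
# NoteRecord(note="E", strength=e1 + e3 + e5, portion=1)
# for the amplitude values e1, e3, e5 discussed above.
```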

Considering the example where notes are identified within the five octaves 1 to 5 of the Equal Tempered Scale, it is likely that the strongest identified musical note occurring in any portion is due to:

-   a) a bass note having components with significant amplitudes in most of the five octaves, and/or
-   b) a higher pitched note with large amplitude components in the upper octaves (e.g. octaves 4 and 5).

Suitable selection of portion size may help to discriminate between these notes. As portion size increases, the number of identifiable notes within a portion may increase. Recalling that the ability to identify more than one musical note for a portion depends on the number of octaves into which the frequency domain representation of that portion is subdivided, then for a given number of octaves, a larger portion size reduces the ability to identify all the musical notes that are present. Conversely, in order to minimise the influence of strong notes in the higher part of the bass register (e.g. octaves 4 and 5), the portion size should suitably be selected such that bass notes and strong higher notes less often occupy the same portion. The size of portions may be variable or fixed. An advantage of using a fixed portion size is a reduced processing requirement (resulting in faster execution). Preferably, each portion is the same size, for example each portion encompasses the same length of time. Selection of portion size can be a function of the tempo (beat rate) of an audio signal. Where the tempo is unknown, portion size might be selected as a function of the maximum expected tempo, for example 240 beats per minute. It may be further refined by assuming a maximum number of distinctly played notes per beat, such as two notes per beat. For example, an audio signal comprising 44100 samples per second might be analysed in portions each having a size of 5512 samples, representing one eighth of a second, which corresponds to a tempo of 240 beats per minute with a maximum of two distinctly played notes (i.e. quavers) per beat. In this example, for convenience the portion size might be rounded down to 5000 samples.
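The arithmetic of this worked example can be reproduced directly; the function name and default values below are illustrative.

```python
def portion_size_samples(sample_rate=44100, max_tempo_bpm=240, notes_per_beat=2):
    """Samples per portion for the shortest expected note: 240 bpm with
    two notes per beat gives 8 notes per second, i.e. 1/8 s portions."""
    notes_per_second = (max_tempo_bpm / 60.0) * notes_per_beat
    return int(sample_rate / notes_per_second)

# portion_size_samples() -> 5512; as noted above, this might be rounded
# down to a convenient 5000 samples.
```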

FIG. 4a is a table showing a set of data records corresponding to portions of the audio signal. A data record 402 includes fields to identify the portion in which one or two notes were identified and the strength associated with each note. Data record 404 relates to portion 1 and identifies one note (E) with an associated strength (30). Similarly, data record 406 relates to portion 4 and identifies two notes (C and F sharp, F#) with associated strengths (100 and 10 respectively).

The set of data records comprises records for a number of portions, each data record comprising note and strength data for a particular portion, as discussed. The method now filters out certain identified musical notes within the data records, for example by ignoring the strength associated with a note of a portion where it is less than a predetermined fraction of the strength of the strongest identified musical note occurring in any portion. The filtering helps to emphasise, for example, stronger notes within the audio signal, such notes tending to be more related to the key. In the example case where bass notes are identified, an ignored strength associated with a note of a portion may correspond to a note having relatively little bass content (for example, only having contributions within the higher octaves of the frequency domain representation of the portion) or a note with relatively low bass level such that it makes little overall contribution (e.g. a relatively quiet note, or noise). The predetermined fraction may lie in the range of one tenth to one half of the strength of the strongest identified note of any portion. The predetermined fraction can be determined in dependence on the content of the audio signal; for example, a first piece of music having more instruments playing within the bass register (compared to a second piece of music) may require a different filtering fraction compared to the second piece. The predetermined fraction selected may also be dependent on a music genre; for example, a suitable predetermined fraction for popular music is one seventh. Preferably, one seventh is used as the default value for the predetermined fraction. In cases where the default value of one seventh gives poor results in terms of determining the key, alternative filtering might be performed using a different fraction value. Selection of a suitable fraction value can be made empirically or according to the content or genre of the audio signal, as discussed above.

In the example of FIG. 4a, the audio signal is known to be popular music and so the predetermined fraction of one seventh is used. The maximum strength in the set of data records 400 is 100 (the strength 410 associated with the identified note C in portion 4). Therefore strengths 414, 416, 418, 420 within the set of data records 400 are each less than 100/7 and will be ignored in subsequent processing, for example by being deleted (not shown in FIG. 4a) from their respective data records within the set of data records 400. A first musical note is then determined from the identified notes as a function of their respective strengths. An example may comprise taking the strengths of the identified notes of each portion having the same note name and calculating the total strength of each identified note of the musical scale across all the portions.
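To make the filtering and totalling concrete, here is a small sketch over a hypothetical subset of the FIG. 4a data; only the E/30, C/100 and F#/10 values are quoted in the text, and the rest of the table is not reproduced here.

```python
from collections import defaultdict

records = [
    {"E": 30.0},               # portion 1 (data record 404)
    {"C": 100.0, "F#": 10.0},  # portion 4 (data record 406)
]

# Predetermined fraction of one seventh of the maximum strength (100).
threshold = max(s for r in records for s in r.values()) / 7  # ~14.3

totals = defaultdict(float)
for record in records:
    for note, strength in record.items():
        if strength >= threshold:  # F# (10 < 100/7) is ignored
            totals[note] += strength

# totals -> {"E": 30.0, "C": 100.0}; with the full table of FIG. 4a the
# total for C reaches 160 and C is determined to be the first note.
```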

FIG. 4b is a table showing total strengths associated with identified notes as derived from the data within the table of FIG. 4a. Each total strength calculated corresponds to one of the twelve notes 452 of the chromatic scale of the Equal Tempered Scale. The identified note having the highest total strength is deemed to be the first note (which in this example is the tonic) related to the musical key of the audio signal. Second and third notes are selected by their relation to the tonic such that their relative strength indicates whether the mode of the key is major or minor. For example, for the scale of which the tonic is the key note, the 3rd step (interval) of the scale may be examined. Where the analysed portions of the audio signal are mainly in a major key, there will be stronger occurrences of the 4th semitone up from the tonic (for example, where the tonic is the note C, the 4th semitone of C major is the note named E natural). Alternatively, where the analysed portions of the audio signal are mainly in a minor key, there will be stronger occurrences of the 3rd semitone up from the tonic (for example, where the tonic is the note C, the 3rd semitone of C minor is the note named D sharp, D#). Therefore, for the present example, comparing the relative total strengths of identified notes at the 4th and 3rd semitone up from the tonic should indicate whether the key is major or minor (for the key of C, comparing identified notes E and D#). Alternative notes could be examined to determine major and minor, including notes of the 6th interval (for example, for the key of C, comparing identified notes A natural and G sharp, G#).

In FIG. 4b, identified note C 454 has the highest total strength 466 (comprising the addition of strengths 408, 410, 412) and is therefore deemed to be the first note (and tonic). Other identified notes, as contained in the set of data records 400, comprise notes 456, 458, 460, 462, 464, with corresponding (filtered) strengths 468, 470, 472, 474, 476. It can be seen that, for example, the total strength 470 of note 458 excludes the contribution 420 since this is considered to be an irrelevant note or noise and is therefore filtered out (ignored). As discussed above, further identified notes are then selected as a function of the tonic, for example at the 3rd and 6th musical intervals. The method selects identified musical notes 456, 478 (or alternatively 464, 480) corresponding to the 3rd (or 6th) musical intervals based on the tonic. A comparison of the total strengths 468, 482 (or alternatively 476, 484) of each selected identified musical note is used to determine the major or minor mode of the musical key of the audio signal. In the example of FIG. 4b, the tonic of the key is C (largest total strength of 160); comparing the total strengths 468 and 482 of the respective major and minor 3rd interval notes 456 and 478, it can be determined that the key is C major. It is to be observed that a key may have a modality of a type which requires the selection of additional or alternative identified notes to those described in order to fully determine the mode of the key.
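The major/minor comparison described for FIG. 4b might be coded as follows; this is an illustrative sketch, with `totals` a mapping from note names to total strengths such as the one built in the previous example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def mode_of_key(tonic, totals):
    """Compare the 4th semitone (major third) against the 3rd semitone
    (minor third) above the tonic; the stronger one indicates the mode."""
    idx = NOTE_NAMES.index(tonic)
    major_third = NOTE_NAMES[(idx + 4) % 12]  # E when the tonic is C
    minor_third = NOTE_NAMES[(idx + 3) % 12]  # D# when the tonic is C
    major = totals.get(major_third, 0.0)
    minor = totals.get(minor_third, 0.0)
    return "major" if major >= minor else "minor"

# With FIG. 4b's totals, the tonic is C (total 160) and the strength of E
# exceeds that of D#, so mode_of_key("C", totals) returns "major".
```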

FIG. 5 is a schematic representation of an apparatus, shown generally at 500, for determining the key of an audio signal. The apparatus comprises an input device 510 which is used to receive an audio signal. The input device might include an interface to read physical media (magnetic tape, magnetic or optical disc, etc.) or perhaps to interface to a wired and/or wireless network, thereby enabling access to local and remote network sources, including Internet sources. In particular, examples of suitable wired systems include Ethernet, RS232 and USB; examples of suitable wireless systems include WiFi, 802.11b, Low Power radio and Bluetooth. The audio signal may comprise any suitable analogue or digital format. The received audio signal may be baseband or modulated. Examples of suitable digital audio signal formats include AES/EBU, CD audio, WAV and AIFF. The input device may perform processing in order to present the audio signal in a form suitable for the data processing apparatus 502 section of the apparatus.

The apparatus also comprises a CPU 504, program ROM 506 and RAM 508 (which together constitute data processing apparatus 502), which are interconnected and communicate with input device 510 via bus 512. The program ROM includes code which, when run by the CPU, is operable to execute the method steps. The program code might alternatively be downloaded from a source remote to the apparatus via the input device and stored in local storage such as the RAM 508. The RAM is generally used to hold temporary results. The input device 510 and/or the data processing apparatus 502 may be implemented in hardware or software or any combination of these. For example, an ASIC may implement the functions of the input device and/or data processing apparatus. In another example, the input device might be a wireless air interface and the data processing apparatus implemented using a conventional CPU, ROM and RAM.

A user interface 514 could be connected to the data processing apparatus via bus 512 and this interface can then be used to enable a user to configure the method, for example to select a type of music mood required (sad, happy, etc.), which selection might be used to establish which musical keys to look for. Store 516 can contain a list of audio signal identifiers (e.g. data describing the locations of audio signals) or audio signal files (for example music tracks) together with their musical keys (as determined from prior analysis, for example by the apparatus). In response to user input or in any other way, the apparatus accesses and analyses audio signals and/or selects audio signals based on one or more determined keys for a purpose such as compiling a playlist, which playlist is compiled according to the input information including mood, situation, etc. The apparatus can access and analyse audio signals from remote sources to offer tracks according to the input information. In another case, the apparatus can output musical key and audio signal information via output device 518 for use by another apparatus or system. The output device can comprise any suitable implementation, including those mentioned above in respect of the input device, for interfacing to physical media and/or network entities.

The invention may be incorporated within any suitable apparatus configured as a dedicated key extraction apparatus or to provide key extraction features within a host product or application. Examples of suitable apparatus include audio jukeboxes, Internet radio and playlist generators (e.g. for radio station use). Audio jukeboxes may access audio signals using removable media (utilising magnetic tape/disc and/or optical disc) and/or via networking technologies (local and wide area, including the Internet, etc.) by means of wired or wireless interconnection.

The foregoing method and implementation are presented by way of example only and represent a selection from a range of methods and implementations that can readily be identified by a person skilled in the art to exploit the advantages of the present invention.

In the description above and with reference to FIG. 1 there is disclosed a method for determining the key of an audio signal such as a music track. Portions 106 of the audio signal are analysed 104 to identify 108 a musical note and its associated strength 110 within each portion. Some notes identified in a portion may be ignored 118 to enable notes related to the key to be more readily distinguished. A first note is then determined 124 from the identified musical notes as a function of their respective strengths. From the identified musical notes, at least two further notes are selected 128 as a function of the first note. The key of the audio signal is then determined 130 based on a comparison of the respective strengths of the selected notes.

1. A method for determining the key of an audio signal, the method comprising the steps of: for each of a plurality of signal portions, analysing the portion to identify a musical note, and where at least one musical note is identified: determining a strength associated with the or each musical note; and generating a data record containing the identity of the or each musical note, the strength associated with the or each musical note and the identity of the portion; for each of the data records, ignoring the strength associated with an identified musical note where said strength is less than a predetermined fraction of the maximum strength associated with any identified musical note contained within the data records; determining a first note from the identified musical notes as a function of their respective strengths; selecting at least a second and a third note from the identified musical notes as a function of the first note; and determining the key based on a comparison of the respective strengths of the at least second and third notes.

2. A method as claimed in claim 1, wherein each portion is the same size.

3. A method as claimed in claim 1, wherein each portion encompasses the same length of time.

4. A method as claimed in claim 1, wherein the size of the portion is a function of the tempo of the audio signal.

5. A method as claimed in claim 1, wherein the portions are contiguous.

6. A method as claimed in claim 1, wherein the predetermined fraction is determined in dependence on the content of the audio signal.

7. A method as claimed in claim 1, wherein the predetermined fraction lies in the range of one tenth to one half.

8. A method as claimed in claim 7, wherein the predetermined fraction is one seventh.

9. A method as claimed in claim 1, wherein the step of analysing the portion to identify a musical note comprises the steps of: converting the portion to a frequency domain representation; subdividing the frequency domain representation into a plurality of octaves; for each octave containing a maximum amplitude: determining a frequency value at the maximum amplitude; and selecting a note name of a musical scale in dependence on the frequency value; and identifying a musical note in dependence on the same note name being selected in more than one octave.

10. A method as claimed in claim 9, wherein the conversion of the portion to a frequency domain representation is performed by means of a Fourier Transform.

11. A method as claimed in claim 9, wherein the musical scale is the Equal Tempered Scale.

12. A method as claimed in claim 1, wherein the step of determining a strength associated with the or each musical note comprises the steps of: determining the amplitude of each frequency component of the musical note; and summing the amplitudes.

13. A method as claimed in claim 1, wherein the step of determining the first note comprises the steps of: for each identified musical note, summing the strengths associated with the musical note in the data records; and determining the first note to be the identified musical note with the maximum summed strength.

14. A method as claimed in claim 1, wherein the first note is the tonic of the key.

15. An apparatus for determining the key of an audio signal, the apparatus comprising: an input device operable to receive a signal; a data processing apparatus operable to: for each of a plurality of signal portions, analyse the portion to identify a musical note, and where at least one musical note is identified: determine a strength associated with the or each musical note; and generate a data record containing the identity of the or each musical note, the strength associated with the or each musical note and the identity of the portion; for each of the data records, ignore the strength associated with an identified musical note where said strength is less than a predetermined fraction of the maximum strength associated with any identified musical note contained within the data records; determine a first note from the identified musical notes as a function of their respective strengths; select at least a second and a third note from the identified musical notes as a function of the first note; and determine the key based on a comparison of the respective strengths of the at least second and third notes.

16. An apparatus as claimed in claim 15, wherein the predetermined fraction is determined in dependence on the content of the audio signal.

17. An apparatus as claimed in claim 16, wherein the predetermined fraction lies in the range of one tenth to one half.

18. An apparatus as claimed in claim 17, wherein the predetermined fraction is one seventh.

19. An apparatus as claimed in claim 15, wherein for each of a plurality of signal portions, to analyse the portion to identify a musical note the data processing apparatus is operable to: convert the portion to a frequency domain representation; subdivide the frequency domain representation into a plurality of octaves; for each octave containing a maximum amplitude: determine a frequency value at the maximum amplitude; and select a note name of a musical scale in dependence on the frequency value; and identify a musical note in dependence on the same note name being selected in more than one octave.

20. An apparatus as claimed in claim 19, wherein the data processing apparatus is operable to convert the portion to a frequency domain representation by performing a Fourier Transform.

21. An apparatus as claimed in claim 19, wherein the musical scale is the Equal Tempered Scale.

22. An apparatus as claimed in claim 15, wherein to determine a strength associated with the or each musical note the data processing apparatus is operable to: determine the amplitude of each frequency component of the musical note; and sum the amplitudes.

23. An apparatus as claimed in claim 15, wherein to determine the first note the data processing apparatus is operable to: for each identified musical note, sum the strengths associated with the musical note in the data records; and determine the first note to be the identified musical note with the maximum summed strength.

24. An apparatus as claimed in claim 15, further comprising an output device operable to send data corresponding to the key of the audio signal.

25. A record carrier comprising software operable to carry out the method of claim 1.

26. A software utility configured for carrying out the method steps as claimed in claim 1.

27. A jukebox including a data processor, said data processor being directed in its operations by a software utility as claimed in claim 26.

28. A method for determining the key of an audio signal substantially as hereinbefore described and with reference to the accompanying drawings.

29. An apparatus for determining the key of an audio signal substantially as hereinbefore described and with reference to the accompanying drawings.